Dataset statistics
| Number of variables | 12 |
|---|---|
| Number of observations | 2260507 |
| Missing cells | 1213442 |
| Missing cells (%) | 4.5% |
| Duplicate rows | 5111 |
| Duplicate rows (%) | 0.2% |
| Total size in memory | 207.0 MiB |
| Average record size in memory | 96.0 B |
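The percentages in the table above follow from the reported counts. As a quick sanity check (a sketch only; the variable names are illustrative and not part of the report), the arithmetic can be reproduced like this:

```python
# Counts copied from the "Dataset statistics" table.
n_variables = 12
n_observations = 2_260_507
missing_cells = 1_213_442
duplicate_rows = 5_111
total_size_mib = 207.0

# A "cell" is one value of one variable in one observation.
total_cells = n_variables * n_observations

missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations
# 1 MiB = 1024**2 bytes.
avg_record_bytes = total_size_mib * 1024**2 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Duplicate rows: {duplicate_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

The rounded results agree with the 4.5%, 0.2% and 96.0 B shown in the table.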
Variable types
| Categorical | 5 |
|---|---|
| Numeric | 7 |
Warnings
| Warning | Type |
|---|---|
| Dataset has 5111 (0.2%) duplicate rows | Duplicates |
| Start_Time has a high cardinality: 2206603 distinct values | High cardinality |
| Weather_Condition has a high cardinality: 116 distinct values | High cardinality |
| Precipitation(in) has 1203775 (53.3%) missing values | Missing |
| Precipitation(in) is highly skewed (γ1 = 57.07482798) | Skewed |
| Start_Time is uniformly distributed | Uniform |
| Wind_Speed(mph) has 158060 (7.0%) zeros | Zeros |
| Precipitation(in) has 886656 (39.2%) zeros | Zeros |
Reproduction
| Analysis started | 2021-05-14 14:08:40.814660 |
|---|---|
| Analysis finished | 2021-05-14 14:12:06.530764 |
| Duration | 3 minutes and 25.72 seconds |
| Software version | pandas-profiling v2.11.0 |
| Download configuration | config.yaml |
State
Categorical
| Distinct | 49 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 17.2 MiB |
| CA | |
|---|---|
| TX | |
| FL | |
| SC | |
| NC | |
| Other values (44) |
Length
| Max length | 2 |
|---|---|
| Median length | 2 |
| Mean length | 2 |
| Min length | 2 |
Characters and Unicode
| Total characters | 4521014 |
|---|---|
| Distinct characters | 24 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | OH |
|---|---|
| 2nd row | OH |
| 3rd row | OH |
| 4th row | OH |
| 5th row | OH |
| Value | Count | Frequency (%) |
| CA | 380086 | |
| TX | 263977 | 11.7% |
| FL | 194634 | 8.6% |
| SC | 158916 | 7.0% |
| NC | 123051 | 5.4% |
| NY | 113943 | 5.0% |
| PA | 76754 | 3.4% |
| MI | 72226 | 3.2% |
| IL | 64985 | 2.9% |
| GA | 64630 | 2.9% |
| Other values (39) | 747305 |
| Value | Count | Frequency (%) |
| ca | 380086 | |
| tx | 263977 | 11.7% |
| fl | 194634 | 8.6% |
| sc | 158916 | 7.0% |
| nc | 123051 | 5.4% |
| ny | 113943 | 5.0% |
| pa | 76754 | 3.4% |
| mi | 72226 | 3.2% |
| il | 64985 | 2.9% |
| ga | 64630 | 2.9% |
| Other values (39) | 747305 |
Most occurring characters
| Value | Count | Frequency (%) |
| A | 821981 | |
| C | 694125 | |
| N | 430000 | |
| L | 358901 | |
| T | 347709 | |
| X | 263977 | 5.8% |
| M | 204829 | 4.5% |
| F | 194634 | 4.3% |
| I | 190039 | 4.2% |
| S | 166809 | 3.7% |
| Other values (14) | 848010 |
Most occurring categories
| Value | Count | Frequency (%) |
| Uppercase Letter | 4521014 |
Most frequent character per category
| Value | Count | Frequency (%) |
| A | 821981 | |
| C | 694125 | |
| N | 430000 | |
| L | 358901 | |
| T | 347709 | |
| X | 263977 | 5.8% |
| M | 204829 | 4.5% |
| F | 194634 | 4.3% |
| I | 190039 | 4.2% |
| S | 166809 | 3.7% |
| Other values (14) | 848010 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 4521014 |
Most frequent character per script
| Value | Count | Frequency (%) |
| A | 821981 | |
| C | 694125 | |
| N | 430000 | |
| L | 358901 | |
| T | 347709 | |
| X | 263977 | 5.8% |
| M | 204829 | 4.5% |
| F | 194634 | 4.3% |
| I | 190039 | 4.2% |
| S | 166809 | 3.7% |
| Other values (14) | 848010 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 4521014 |
Most frequent character per block
| Value | Count | Frequency (%) |
| A | 821981 | |
| C | 694125 | |
| N | 430000 | |
| L | 358901 | |
| T | 347709 | |
| X | 263977 | 5.8% |
| M | 204829 | 4.5% |
| F | 194634 | 4.3% |
| I | 190039 | 4.2% |
| S | 166809 | 3.7% |
| Other values (14) | 848010 |
Start_Time
Categorical
| Distinct | 2206603 |
|---|---|
| Distinct (%) | 97.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 17.2 MiB |
| 2018-11-25 01:22:49 | 31 |
|---|---|
| 2018-11-12 00:37:27 | 27 |
| 2016-04-10 08:59:26 | 27 |
| 2017-09-09 09:03:14 | 23 |
| 2017-09-06 15:52:36 | 22 |
| Other values (2206598) |
Length
| Max length | 19 |
|---|---|
| Median length | 19 |
| Mean length | 19 |
| Min length | 19 |
Characters and Unicode
| Total characters | 42949633 |
|---|---|
| Distinct characters | 13 |
| Distinct categories | 4 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 2158939 |
|---|---|
| Unique (%) | 95.5% |
Sample
| 1st row | 2016-02-08 06:49:27 |
|---|---|
| 2nd row | 2016-02-08 07:23:34 |
| 3rd row | 2016-02-08 07:39:07 |
| 4th row | 2016-02-08 07:44:26 |
| 5th row | 2016-02-08 07:59:35 |
| Value | Count | Frequency (%) |
| 2018-11-25 01:22:49 | 31 | < 0.1% |
| 2018-11-12 00:37:27 | 27 | < 0.1% |
| 2016-04-10 08:59:26 | 27 | < 0.1% |
| 2017-09-09 09:03:14 | 23 | < 0.1% |
| 2017-09-06 15:52:36 | 22 | < 0.1% |
| 2019-12-17 06:32:11 | 22 | < 0.1% |
| 2016-06-12 10:07:37 | 22 | < 0.1% |
| 2018-03-28 02:09:15 | 21 | < 0.1% |
| 2016-05-21 08:30:42 | 21 | < 0.1% |
| 2020-02-13 06:52:38 | 20 | < 0.1% |
| Other values (2206593) | 2260271 |
| Value | Count | Frequency (%) |
| 2019-11-15 | 2882 | 0.1% |
| 2019-11-22 | 2829 | 0.1% |
| 2019-11-14 | 2791 | 0.1% |
| 2019-11-12 | 2782 | 0.1% |
| 2018-11-09 | 2777 | 0.1% |
| 2018-11-06 | 2724 | 0.1% |
| 2019-10-16 | 2638 | 0.1% |
| 2019-11-20 | 2633 | 0.1% |
| 2019-11-21 | 2626 | 0.1% |
| 2018-11-02 | 2596 | 0.1% |
| Other values (86571) | 4493736 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 7626939 | |
| 1 | 6561039 | |
| 2 | 5495995 | |
| - | 4521014 | |
| : | 4521014 | |
| (space) | 2260507 | 5.3% |
| 3 | 1835466 | 4.3% |
| 8 | 1825787 | 4.3% |
| 5 | 1798008 | 4.2% |
| 4 | 1760142 | 4.1% |
| Other values (3) | 4743722 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 31647098 | |
| Dash Punctuation | 4521014 | 10.5% |
| Other Punctuation | 4521014 | 10.5% |
| Space Separator | 2260507 | 5.3% |
Most frequent character per category
| Value | Count | Frequency (%) |
| 0 | 7626939 | |
| 1 | 6561039 | |
| 2 | 5495995 | |
| 3 | 1835466 | 5.8% |
| 8 | 1825787 | 5.8% |
| 5 | 1798008 | 5.7% |
| 4 | 1760142 | 5.6% |
| 9 | 1703715 | 5.4% |
| 7 | 1671164 | 5.3% |
| 6 | 1368843 | 4.3% |
| Value | Count | Frequency (%) |
| - | 4521014 |
| Value | Count | Frequency (%) |
| (space) | 2260507 |
| Value | Count | Frequency (%) |
| : | 4521014 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 42949633 |
Most frequent character per script
| Value | Count | Frequency (%) |
| 0 | 7626939 | |
| 1 | 6561039 | |
| 2 | 5495995 | |
| - | 4521014 | |
| : | 4521014 | |
| (space) | 2260507 | 5.3% |
| 3 | 1835466 | 4.3% |
| 8 | 1825787 | 4.3% |
| 5 | 1798008 | 4.2% |
| 4 | 1760142 | 4.1% |
| Other values (3) | 4743722 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 42949633 |
Most frequent character per block
| Value | Count | Frequency (%) |
| 0 | 7626939 | |
| 1 | 6561039 | |
| 2 | 5495995 | |
| - | 4521014 | |
| : | 4521014 | |
| (space) | 2260507 | 5.3% |
| 3 | 1835466 | 4.3% |
| 8 | 1825787 | 4.3% |
| 5 | 1798008 | 4.2% |
| 4 | 1760142 | 4.1% |
| Other values (3) | 4743722 |
Severity
Categorical
| Distinct | 4 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 17.2 MiB |
| 2 | |
|---|---|
| 3 | |
| 4 | 8267 |
| 1 | 946 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 2260507 |
|---|---|
| Distinct characters | 4 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 2 |
|---|---|
| 2nd row | 3 |
| 3rd row | 2 |
| 4th row | 3 |
| 5th row | 2 |
| Value | Count | Frequency (%) |
| 2 | 1505417 | |
| 3 | 745877 | |
| 4 | 8267 | 0.4% |
| 1 | 946 | < 0.1% |
| Value | Count | Frequency (%) |
| 2 | 1505417 | |
| 3 | 745877 | |
| 4 | 8267 | 0.4% |
| 1 | 946 | < 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| 2 | 1505417 | |
| 3 | 745877 | |
| 4 | 8267 | 0.4% |
| 1 | 946 | < 0.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 2260507 |
Most frequent character per category
| Value | Count | Frequency (%) |
| 2 | 1505417 | |
| 3 | 745877 | |
| 4 | 8267 | 0.4% |
| 1 | 946 | < 0.1% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 2260507 |
Most frequent character per script
| Value | Count | Frequency (%) |
| 2 | 1505417 | |
| 3 | 745877 | |
| 4 | 8267 | 0.4% |
| 1 | 946 | < 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 2260507 |
Most frequent character per block
| Value | Count | Frequency (%) |
| 2 | 1505417 | |
| 3 | 745877 | |
| 4 | 8267 | 0.4% |
| 1 | 946 | < 0.1% |
Start_Lng
Real number (ℝ)
| Distinct | 708230 |
|---|---|
| Distinct (%) | 31.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -92.61955627 |
|---|---|
| Minimum | -124.623833 |
| Maximum | -67.839745 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 17.2 MiB |
Quantile statistics
| Minimum | -124.623833 |
|---|---|
| 5-th percentile | -122.086945 |
| Q1 | -97.785866 |
| median | -86.780106 |
| Q3 | -80.818542 |
| 95-th percentile | -73.8449961 |
| Maximum | -67.839745 |
| Range | 56.784088 |
| Interquartile range (IQR) | 16.967324 |
Descriptive statistics
| Standard deviation | 15.87602625 |
|---|---|
| Coefficient of variation (CV) | -0.1714111672 |
| Kurtosis | -0.8004525872 |
| Mean | -92.61955627 |
| Median Absolute Deviation (MAD) | 8.790725 |
| Skewness | -0.740033074 |
| Sum | -209367155.3 |
| Variance | 252.0482095 |
| Monotonicity | Not monotonic |
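Several of the Start_Lng figures above are derived from others in the same tables. The sketch below recomputes them from the reported values, assuming the usual definitions (range = maximum minus minimum, IQR = Q3 minus Q1, CV = standard deviation divided by mean, variance = standard deviation squared); the variable names are illustrative.

```python
# Values copied from the Start_Lng tables above.
minimum, maximum = -124.623833, -67.839745
q1, q3 = -97.785866, -80.818542
mean, std = -92.61955627, 15.87602625

value_range = maximum - minimum  # reported: 56.784088
iqr = q3 - q1                    # reported: 16.967324
cv = std / mean                  # reported: -0.1714111672
variance = std ** 2              # reported: 252.0482095

print(f"Range:    {value_range:.6f}")
print(f"IQR:      {iqr:.6f}")
print(f"CV:       {cv:.7f}")
print(f"Variance: {variance:.5f}")
```

The coefficient of variation is negative here only because the mean longitude is negative, so its sign carries no information about the spread.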
| Value | Count | Frequency (%) |
| -84.390343 | 482 | < 0.1% |
| -83.111794 | 472 | < 0.1% |
| -122.366852 | 458 | < 0.1% |
| -82.259857 | 439 | < 0.1% |
| -118.096634 | 414 | < 0.1% |
| -83.058128 | 409 | < 0.1% |
| -80.204353 | 381 | < 0.1% |
| -93.26992 | 373 | < 0.1% |
| -82.2603 | 367 | < 0.1% |
| -118.368263 | 354 | < 0.1% |
| Other values (708220) | 2256358 |
| Value | Count | Frequency (%) |
| -124.623833 | 1 | |
| -124.534439 | 1 | |
| -124.484421 | 1 | |
| -124.479179 | 1 | |
| -124.479156 | 1 |
| Value | Count | Frequency (%) |
| -67.839745 | 1 | |
| -67.841858 | 1 | |
| -68.060165 | 1 | |
| -68.14003 | 1 | |
| -68.380852 | 1 |
Start_Lat
Real number (ℝ≥0)
| Distinct | 743384 |
|---|---|
| Distinct (%) | 32.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 36.07856652 |
|---|---|
| Minimum | 24.555269 |
| Maximum | 49.002201 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 17.2 MiB |
Quantile statistics
| Minimum | 24.555269 |
|---|---|
| 5-th percentile | 28.12026 |
| Q1 | 32.92453 |
| median | 35.391476 |
| Q3 | 40.068172 |
| 95-th percentile | 43.2429418 |
| Maximum | 49.002201 |
| Range | 24.446932 |
| Interquartile range (IQR) | 7.143642 |
Descriptive statistics
| Standard deviation | 4.932324556 |
|---|---|
| Coefficient of variation (CV) | 0.1367106576 |
| Kurtosis | -0.6066374191 |
| Mean | 36.07856652 |
| Median Absolute Deviation (MAD) | 3.653068 |
| Skewness | 0.08649421934 |
| Sum | 81555852.18 |
| Variance | 24.32782552 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 33.744976 | 483 | < 0.1% |
| 42.476501 | 472 | < 0.1% |
| 37.808498 | 452 | < 0.1% |
| 34.858925 | 438 | < 0.1% |
| 33.941364 | 416 | < 0.1% |
| 42.368423 | 408 | < 0.1% |
| 25.789072 | 380 | < 0.1% |
| 44.966118 | 372 | < 0.1% |
| 34.858795 | 366 | < 0.1% |
| 34.833031 | 349 | < 0.1% |
| Other values (743374) | 2256371 |
| Value | Count | Frequency (%) |
| 24.555269 | 1 | |
| 24.5574 | 1 | |
| 24.55987 | 1 | |
| 24.560246 | 1 | |
| 24.560688 | 1 |
| Value | Count | Frequency (%) |
| 49.002201 | 1 | < 0.1% |
| 49.000759 | 1 | < 0.1% |
| 48.999901 | 1 | < 0.1% |
| 48.999569 | 1 | < 0.1% |
| 48.998241 | 4 |
Humidity(%)
Real number (ℝ≥0)
| Distinct | 100 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 2489 |
| Missing (%) | 0.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 65.75183546 |
|---|---|
| Minimum | 1 |
| Maximum | 100 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 17.2 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 27 |
| Q1 | 50 |
| median | 68 |
| Q3 | 84 |
| 95-th percentile | 97 |
| Maximum | 100 |
| Range | 99 |
| Interquartile range (IQR) | 34 |
Descriptive statistics
| Standard deviation | 22.05538544 |
|---|---|
| Coefficient of variation (CV) | 0.3354337607 |
| Kurtosis | -0.6871174464 |
| Mean | 65.75183546 |
| Median Absolute Deviation (MAD) | 17 |
| Skewness | -0.3883801916 |
| Sum | 148468828 |
| Variance | 486.440027 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 93 | 86908 | 3.8% |
| 100 | 85379 | 3.8% |
| 90 | 56706 | 2.5% |
| 87 | 54162 | 2.4% |
| 96 | 42259 | 1.9% |
| 84 | 40905 | 1.8% |
| 89 | 40272 | 1.8% |
| 94 | 39609 | 1.8% |
| 81 | 38340 | 1.7% |
| 82 | 37730 | 1.7% |
| Other values (90) | 1735748 |
| Value | Count | Frequency (%) |
| 1 | 1 | < 0.1% |
| 2 | 9 | < 0.1% |
| 3 | 45 | < 0.1% |
| 4 | 465 | |
| 5 | 985 |
| Value | Count | Frequency (%) |
| 100 | 85379 | |
| 99 | 2893 | 0.1% |
| 98 | 1699 | 0.1% |
| 97 | 29125 | 1.3% |
| 96 | 42259 |
Pressure(in)
Real number (ℝ≥0)
| Distinct | 455 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 29.81763357 |
|---|---|
| Minimum | 26.51 |
| Maximum | 31.15 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 17.2 MiB |
Quantile statistics
| Minimum | 26.51 |
|---|---|
| 5-th percentile | 28.93 |
| Q1 | 29.7 |
| median | 29.94 |
| Q3 | 30.08 |
| 95-th percentile | 30.32 |
| Maximum | 31.15 |
| Range | 4.64 |
| Interquartile range (IQR) | 0.38 |
Descriptive statistics
| Standard deviation | 0.451540204 |
|---|---|
| Coefficient of variation (CV) | 0.01514339503 |
| Kurtosis | 5.726885162 |
| Mean | 29.81763357 |
| Median Absolute Deviation (MAD) | 0.17 |
| Skewness | -1.859723004 |
| Sum | 67402969.41 |
| Variance | 0.2038885558 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 29.96 | 45266 | 2.0% |
| 30.01 | 44905 | 2.0% |
| 29.99 | 44680 | 2.0% |
| 29.94 | 43410 | 1.9% |
| 30.04 | 42473 | 1.9% |
| 30.06 | 40576 | 1.8% |
| 29.91 | 40384 | 1.8% |
| 30.03 | 38301 | 1.7% |
| 29.97 | 38180 | 1.7% |
| 29.98 | 38060 | 1.7% |
| Other values (445) | 1844272 |
| Value | Count | Frequency (%) |
| 26.51 | 8 | |
| 26.52 | 11 | |
| 26.53 | 18 | |
| 26.54 | 11 | |
| 26.55 | 14 |
| Value | Count | Frequency (%) |
| 31.15 | 1 | < 0.1% |
| 31.12 | 2 | |
| 31.1 | 1 | < 0.1% |
| 31.08 | 3 | |
| 31.03 | 2 |
Temperature(F)
Real number (ℝ)
| Distinct | 791 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 63.17708235 |
|---|---|
| Minimum | -33 |
| Maximum | 129.2 |
| Zeros | 432 |
| Zeros (%) | < 0.1% |
| Memory size | 17.2 MiB |
Quantile statistics
| Minimum | -33 |
|---|---|
| 5-th percentile | 30 |
| Q1 | 51.1 |
| median | 66 |
| Q3 | 77 |
| 95-th percentile | 89.6 |
| Maximum | 129.2 |
| Range | 162.2 |
| Interquartile range (IQR) | 25.9 |
Descriptive statistics
| Standard deviation | 18.64594592 |
|---|---|
| Coefficient of variation (CV) | 0.295137813 |
| Kurtosis | -0.008301761303 |
| Mean | 63.17708235 |
| Median Absolute Deviation (MAD) | 12.9 |
| Skewness | -0.5398998588 |
| Sum | 142812236.9 |
| Variance | 347.6712992 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 77 | 54259 | 2.4% |
| 68 | 50786 | 2.2% |
| 73 | 49213 | 2.2% |
| 75 | 46129 | 2.0% |
| 72 | 45406 | 2.0% |
| 70 | 44082 | 2.0% |
| 59 | 43525 | 1.9% |
| 79 | 42277 | 1.9% |
| 63 | 41238 | 1.8% |
| 64 | 40816 | 1.8% |
| Other values (781) | 1802776 |
| Value | Count | Frequency (%) |
| -33 | 1 | < 0.1% |
| -29 | 2 | < 0.1% |
| -27.9 | 10 | |
| -27.4 | 2 | < 0.1% |
| -27 | 8 |
| Value | Count | Frequency (%) |
| 129.2 | 2 | < 0.1% |
| 127 | 1 | < 0.1% |
| 123.8 | 1 | < 0.1% |
| 123 | 2 | < 0.1% |
| 122 | 5 |
Wind_Direction
Categorical
| Distinct | 18 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 18 |
| Missing (%) | < 0.1% |
| Memory size | 17.2 MiB |
| S | |
|---|---|
| W | |
| CALM | |
| N | |
| SSW | 133831 |
| Other values (13) |
Length
| Max length | 4 |
|---|---|
| Median length | 3 |
| Mean length | 2.288286296 |
| Min length | 1 |
Characters and Unicode
| Total characters | 5172646 |
|---|---|
| Distinct characters | 10 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | SW |
|---|---|
| 2nd row | SW |
| 3rd row | SW |
| 4th row | SSW |
| 5th row | WSW |
| Value | Count | Frequency (%) |
| S | 211943 | 9.4% |
| W | 178849 | 7.9% |
| CALM | 158059 | 7.0% |
| N | 151502 | 6.7% |
| SSW | 133831 | 5.9% |
| VAR | 133116 | 5.9% |
| SW | 127078 | 5.6% |
| E | 123720 | 5.5% |
| SSE | 122850 | 5.4% |
| WNW | 121172 | 5.4% |
| Other values (8) | 798369 |
| Value | Count | Frequency (%) |
| s | 211943 | 9.4% |
| w | 178849 | 7.9% |
| calm | 158059 | 7.0% |
| n | 151502 | 6.7% |
| ssw | 133831 | 5.9% |
| var | 133116 | 5.9% |
| sw | 127078 | 5.6% |
| e | 123720 | 5.5% |
| sse | 122850 | 5.4% |
| wnw | 121172 | 5.4% |
| Other values (8) | 798369 |
Most occurring characters
| Value | Count | Frequency (%) |
| S | 1152969 | |
| W | 1139061 | |
| N | 970519 | |
| E | 878513 | |
| A | 291175 | 5.6% |
| C | 158059 | 3.1% |
| L | 158059 | 3.1% |
| M | 158059 | 3.1% |
| V | 133116 | 2.6% |
| R | 133116 | 2.6% |
Most occurring categories
| Value | Count | Frequency (%) |
| Uppercase Letter | 5172646 |
Most frequent character per category
| Value | Count | Frequency (%) |
| S | 1152969 | |
| W | 1139061 | |
| N | 970519 | |
| E | 878513 | |
| A | 291175 | 5.6% |
| C | 158059 | 3.1% |
| L | 158059 | 3.1% |
| M | 158059 | 3.1% |
| V | 133116 | 2.6% |
| R | 133116 | 2.6% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 5172646 |
Most frequent character per script
| Value | Count | Frequency (%) |
| S | 1152969 | |
| W | 1139061 | |
| N | 970519 | |
| E | 878513 | |
| A | 291175 | 5.6% |
| C | 158059 | 3.1% |
| L | 158059 | 3.1% |
| M | 158059 | 3.1% |
| V | 133116 | 2.6% |
| R | 133116 | 2.6% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 5172646 |
Most frequent character per block
| Value | Count | Frequency (%) |
| S | 1152969 | |
| W | 1139061 | |
| N | 970519 | |
| E | 878513 | |
| A | 291175 | 5.6% |
| C | 158059 | 3.1% |
| L | 158059 | 3.1% |
| M | 158059 | 3.1% |
| V | 133116 | 2.6% |
| R | 133116 | 2.6% |
Wind_Speed(mph)
Real number (ℝ≥0)
| Distinct | 125 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 8.098481978 |
|---|---|
| Minimum | 0 |
| Maximum | 175 |
| Zeros | 158060 |
| Zeros (%) | 7.0% |
| Memory size | 17.2 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 4.6 |
| median | 7 |
| Q3 | 10.4 |
| 95-th percentile | 17 |
| Maximum | 175 |
| Range | 175 |
| Interquartile range (IQR) | 5.8 |
Descriptive statistics
| Standard deviation | 4.869992634 |
|---|---|
| Coefficient of variation (CV) | 0.6013463569 |
| Kurtosis | 10.02188099 |
| Mean | 8.098481978 |
| Median Absolute Deviation (MAD) | 2.4 |
| Skewness | 1.167952981 |
| Sum | 18306675.2 |
| Variance | 23.71682825 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 4.6 | 167730 | 7.4% |
| 5.8 | 165738 | 7.3% |
| 0 | 158060 | 7.0% |
| 3.5 | 156734 | 6.9% |
| 6.9 | 153797 | 6.8% |
| 8.1 | 137381 | 6.1% |
| 9.2 | 121718 | 5.4% |
| 10.4 | 99603 | 4.4% |
| 5 | 94043 | 4.2% |
| 6 | 91210 | 4.0% |
| Other values (115) | 914493 |
| Value | Count | Frequency (%) |
| 0 | 158060 | |
| 1 | 72 | < 0.1% |
| 1.2 | 319 | < 0.1% |
| 2 | 155 | < 0.1% |
| 2.3 | 638 | < 0.1% |
| Value | Count | Frequency (%) |
| 175 | 3 | |
| 174.9 | 1 | < 0.1% |
| 162.3 | 2 | |
| 161 | 1 | < 0.1% |
| 157 | 1 | < 0.1% |
Precipitation(in)
Real number (ℝ≥0)
| Distinct | 251 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 1203775 |
| Missing (%) | 53.3% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.01473804143 |
|---|---|
| Minimum | 0 |
| Maximum | 25 |
| Zeros | 886656 |
| Zeros (%) | 39.2% |
| Memory size | 17.2 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 0.07 |
| Maximum | 25 |
| Range | 25 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.1540297482 |
|---|---|
| Coefficient of variation (CV) | 10.45116808 |
| Kurtosis | 4030.039281 |
| Mean | 0.01473804143 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 57.07482798 |
| Sum | 15574.16 |
| Variance | 0.02372516334 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 886656 | |
| 0.01 | 47242 | 2.1% |
| 0.02 | 23404 | 1.0% |
| 0.03 | 16045 | 0.7% |
| 0.04 | 11781 | 0.5% |
| 0.05 | 9587 | 0.4% |
| 0.06 | 7637 | 0.3% |
| 0.07 | 6287 | 0.3% |
| 0.08 | 5020 | 0.2% |
| 0.09 | 4491 | 0.2% |
| Other values (241) | 38582 | 1.7% |
| (Missing) | 1203775 |
| Value | Count | Frequency (%) |
| 0 | 886656 | |
| 0.01 | 47242 | 2.1% |
| 0.02 | 23404 | 1.0% |
| 0.03 | 16045 | 0.7% |
| 0.04 | 11781 | 0.5% |
| Value | Count | Frequency (%) |
| 25 | 1 | |
| 10.8 | 1 | |
| 10.14 | 2 | |
| 10.13 | 1 | |
| 10.11 | 1 |
Weather_Condition
Categorical
| Distinct | 116 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 7160 |
| Missing (%) | 0.3% |
| Memory size | 17.2 MiB |
| Clear | |
|---|---|
| Fair | |
| Mostly Cloudy | |
| Overcast | |
| Partly Cloudy | |
| Other values (111) |
Length
| Max length | 35 |
|---|---|
| Median length | 8 |
| Mean length | 8.451608651 |
| Min length | 3 |
Characters and Unicode
| Total characters | 19044407 |
|---|---|
| Distinct characters | 45 |
| Distinct categories | 5 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 9 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | Overcast |
|---|---|
| 2nd row | Mostly Cloudy |
| 3rd row | Mostly Cloudy |
| 4th row | Light Rain |
| 5th row | Overcast |
| Value | Count | Frequency (%) |
| Clear | 470336 | |
| Fair | 395893 | |
| Mostly Cloudy | 332231 | |
| Overcast | 245635 | |
| Partly Cloudy | 228108 | |
| Cloudy | 151073 | 6.7% |
| Scattered Clouds | 133415 | 5.9% |
| Light Rain | 120933 | 5.3% |
| Light Snow | 30993 | 1.4% |
| Rain | 27883 | 1.2% |
| Other values (106) | 116847 | 5.2% |
| Value | Count | Frequency (%) |
| cloudy | 718172 | |
| clear | 470336 | |
| fair | 400038 | |
| mostly | 334760 | |
| overcast | 245635 | 7.7% |
| partly | 229596 | 7.2% |
| rain | 173341 | 5.4% |
| light | 172045 | 5.4% |
| scattered | 133415 | 4.2% |
| clouds | 133415 | 4.2% |
| Other values (46) | 194758 | 6.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| l | 1900400 | 10.0% |
| a | 1699260 | 8.9% |
| r | 1528578 | 8.0% |
| C | 1321934 | 6.9% |
| y | 1317857 | 6.9% |
| t | 1278045 | 6.7% |
| o | 1267582 | 6.7% |
| e | 1064542 | 5.6% |
| d | 1024699 | 5.4% |
| (space) | 952164 | 5.0% |
| Other values (35) | 5689346 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 14892062 | |
| Uppercase Letter | 3179695 | 16.7% |
| Space Separator | 952164 | 5.0% |
| Other Punctuation | 14752 | 0.1% |
| Dash Punctuation | 5734 | < 0.1% |
Most frequent character per category
| Value | Count | Frequency (%) |
| l | 1900400 | |
| a | 1699260 | |
| r | 1528578 | |
| y | 1317857 | |
| t | 1278045 | |
| o | 1267582 | |
| e | 1064542 | |
| d | 1024699 | |
| u | 869742 | |
| i | 793586 | 5.3% |
| Other values (14) | 2147771 |
| Value | Count | Frequency (%) |
| C | 1321934 | |
| F | 425068 | 13.4% |
| M | 337152 | 10.6% |
| O | 245635 | 7.7% |
| P | 231433 | 7.3% |
| S | 180301 | 5.7% |
| R | 173341 | 5.5% |
| L | 172049 | 5.4% |
| H | 37421 | 1.2% |
| T | 23668 | 0.7% |
| Other values (8) | 31693 | 1.0% |
| Value | Count | Frequency (%) |
| (space) | 952164 |
| Value | Count | Frequency (%) |
| / | 14752 |
| Value | Count | Frequency (%) |
| - | 5734 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 18071757 | |
| Common | 972650 | 5.1% |
Most frequent character per script
| Value | Count | Frequency (%) |
| l | 1900400 | 10.5% |
| a | 1699260 | 9.4% |
| r | 1528578 | 8.5% |
| C | 1321934 | 7.3% |
| y | 1317857 | 7.3% |
| t | 1278045 | 7.1% |
| o | 1267582 | 7.0% |
| e | 1064542 | 5.9% |
| d | 1024699 | 5.7% |
| u | 869742 | 4.8% |
| Other values (32) | 4799118 |
| Value | Count | Frequency (%) |
| (space) | 952164 | |
| / | 14752 | 1.5% |
| - | 5734 | 0.6% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 19044407 |
Most frequent character per block
| Value | Count | Frequency (%) |
| l | 1900400 | 10.0% |
| a | 1699260 | 8.9% |
| r | 1528578 | 8.0% |
| C | 1321934 | 6.9% |
| y | 1317857 | 6.9% |
| t | 1278045 | 6.7% |
| o | 1267582 | 6.7% |
| e | 1064542 | 5.6% |
| d | 1024699 | 5.4% |
| (space) | 952164 | 5.0% |
| Other values (35) | 5689346 |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
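As an illustration of that calculation, here is a minimal pure-Python sketch. The function name `pearson_r` and the sample data are made up for this example; it is not the code the report itself runs, and it assumes neither input is constant (otherwise the denominator is zero).

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors of the covariance and the standard deviations cancel,
    # so plain sums of deviations are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


x = [1.0, 2.0, 3.0, 4.0]
steep = [100.0 * v + 7.0 for v in x]    # steep increasing line
shallow = [0.01 * v - 3.0 for v in x]   # shallow increasing line
falling = [10.0 - 3.0 * v for v in x]   # decreasing line

print(pearson_r(x, steep), pearson_r(x, shallow), pearson_r(x, falling))
```

The steep and shallow lines both give r close to +1, which illustrates the invariance to location and scale, while the decreasing line gives r close to -1.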
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
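The rank-based calculation can be sketched in pure Python as follows. The helper names `average_ranks` and `spearman_rho` are illustrative, and giving tied values the average of their positions is an assumption of this sketch (a common convention), not something the report states.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Pearson correlation computed on the rank variables of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but not linear
print(average_ranks([10, 20, 20, 30]))
print(spearman_rho(x, y))
```

Because the cubic relationship is strictly increasing, the ranks of x and y coincide and ρ is +1 even though the relationship is not linear.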
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
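The pair-counting definition above corresponds to the variant usually called tau-a. A minimal sketch follows; the function name `kendall_tau_a` and the data are illustrative, and library implementations often use the tie-adjusted tau-b variant instead, so results can differ when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs; tied pairs count as neither."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1  # both variables move in the same direction
        elif sign < 0:
            discordant += 1  # the variables move in opposite directions
    n = len(xs)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]  # one swapped pair out of six
print(kendall_tau_a(xs, ys))
```

Here five of the six pairs are concordant and one is discordant, so τ = (5 - 1) / 6, roughly 0.67. The quadratic loop is fine for a demonstration but would be slow on millions of rows.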
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution.
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
First rows
| | State | Start_Time | Severity | Start_Lng | Start_Lat | Humidity(%) | Pressure(in) | Temperature(F) | Wind_Direction | Wind_Speed(mph) | Precipitation(in) | Weather_Condition |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | OH | 2016-02-08 06:49:27 | 2 | -84.032608 | 39.063148 | 100.0 | 29.67 | 36.0 | SW | 3.5 | NaN | Overcast |
| 1 | OH | 2016-02-08 07:23:34 | 3 | -84.205582 | 39.747753 | 96.0 | 29.64 | 35.1 | SW | 4.6 | NaN | Mostly Cloudy |
| 2 | OH | 2016-02-08 07:39:07 | 2 | -84.188354 | 39.627781 | 89.0 | 29.65 | 36.0 | SW | 3.5 | NaN | Mostly Cloudy |
| 3 | OH | 2016-02-08 07:44:26 | 3 | -82.925194 | 40.100590 | 97.0 | 29.63 | 37.9 | SSW | 3.5 | 0.03 | Light Rain |
| 4 | OH | 2016-02-08 07:59:35 | 2 | -84.230507 | 39.758274 | 100.0 | 29.66 | 34.0 | WSW | 3.5 | NaN | Overcast |
| 5 | OH | 2016-02-08 07:59:58 | 3 | -84.194901 | 39.770382 | 100.0 | 29.66 | 34.0 | WSW | 3.5 | NaN | Overcast |
| 6 | OH | 2016-02-08 08:00:40 | 2 | -84.172005 | 39.778061 | 99.0 | 29.67 | 33.3 | SW | 1.2 | NaN | Mostly Cloudy |
| 7 | OH | 2016-02-08 08:10:04 | 3 | -82.925194 | 40.100590 | 100.0 | 29.62 | 37.4 | SSW | 4.6 | 0.02 | Light Rain |
| 8 | OH | 2016-02-08 08:14:42 | 3 | -83.119293 | 39.952812 | 93.0 | 29.64 | 35.6 | WNW | 5.8 | NaN | Rain |
| 9 | OH | 2016-02-08 08:21:27 | 3 | -82.830910 | 39.932709 | 100.0 | 29.62 | 37.4 | SSW | 4.6 | 0.02 | Light Rain |
Last rows
| | State | Start_Time | Severity | Start_Lng | Start_Lat | Humidity(%) | Pressure(in) | Temperature(F) | Wind_Direction | Wind_Speed(mph) | Precipitation(in) | Weather_Condition |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2260497 | CA | 2017-08-30 17:32:09 | 3 | -118.046837 | 33.774685 | 47.0 | 29.68 | 87.6 | S | 5.8 | NaN | Partly Cloudy |
| 2260498 | CA | 2017-08-30 17:31:39 | 2 | -117.906784 | 33.853939 | 34.0 | 29.66 | 93.0 | VAR | 3.5 | NaN | Clear |
| 2260499 | CA | 2017-08-30 17:54:40 | 2 | -118.233269 | 34.073830 | 40.0 | 29.67 | 88.0 | VAR | 4.6 | NaN | Clear |
| 2260500 | CA | 2017-08-30 18:04:19 | 3 | -117.938385 | 34.072350 | 27.0 | 29.69 | 98.6 | SSW | 6.9 | NaN | Partly Cloudy |
| 2260501 | CA | 2017-08-30 18:28:48 | 2 | -118.535988 | 34.173161 | 18.0 | 29.66 | 100.0 | WNW | 4.6 | NaN | Clear |
| 2260502 | CA | 2017-08-30 18:41:30 | 3 | -118.623932 | 34.495808 | 18.0 | 28.85 | 100.0 | WNW | 5.0 | 0.0 | Fair |
| 2260503 | CA | 2017-08-30 18:59:02 | 3 | -118.433723 | 34.031322 | 64.0 | 29.69 | 77.0 | SSW | 5.8 | NaN | Clear |
| 2260504 | CA | 2017-08-30 18:57:52 | 3 | -117.369102 | 34.106785 | 16.0 | 29.73 | 102.2 | SSW | 5.8 | NaN | Haze |
| 2260505 | CA | 2017-08-30 19:49:01 | 3 | -118.103981 | 33.924686 | 39.0 | 29.68 | 88.0 | W | 3.5 | NaN | Clear |
| 2260506 | CA | 2017-08-30 20:17:21 | 2 | -117.397354 | 33.729469 | 40.0 | 29.78 | 89.6 | S | 3.5 | NaN | Clear |
Most frequent
| | State | Start_Time | Severity | Start_Lng | Start_Lat | Humidity(%) | Pressure(in) | Temperature(F) | Wind_Direction | Wind_Speed(mph) | Precipitation(in) | Weather_Condition | count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 790 | FL | 2020-11-12 06:28:16 | 3 | -80.187675 | 25.942879 | 100.0 | 29.91 | 78.0 | S | 5.0 | 0.00 | Partly Cloudy | 11 |
| 2152 | SC | 2018-09-16 13:24:13 | 3 | -81.195084 | 33.978249 | 100.0 | 29.74 | 75.0 | SE | 13.8 | 0.01 | Rain | 11 |
| 890 | GA | 2020-03-12 22:33:35 | 3 | -84.513145 | 33.585026 | 65.0 | 28.87 | 69.0 | SSW | 9.0 | 0.00 | Partly Cloudy | 10 |
| 2151 | SC | 2018-09-16 13:24:12 | 3 | -81.195084 | 33.978249 | 100.0 | 29.74 | 75.0 | SE | 13.8 | 0.01 | Rain | 8 |
| 2601 | TX | 2019-09-16 15:09:55 | 3 | -96.897278 | 32.907650 | 34.0 | 29.46 | 95.0 | E | 9.0 | 0.00 | Partly Cloudy | 7 |
| 3027 | WA | 2020-08-14 07:43:26 | 2 | -117.467613 | 47.673512 | 45.0 | 27.67 | 56.0 | SE | 8.0 | 0.00 | Fair | 7 |
| 1425 | MO | 2020-10-27 06:54:07 | 2 | -90.284042 | 38.713409 | 86.0 | 29.65 | 37.0 | NNE | 7.0 | 0.00 | Light Rain | 5 |
| 1404 | MO | 2020-04-10 19:53:37 | 3 | -94.529579 | 38.843868 | 37.0 | 28.75 | 52.0 | SSE | 9.0 | 0.00 | Cloudy | 4 |
| 2284 | SC | 2020-04-03 16:13:52 | 2 | -79.016426 | 33.974316 | 23.0 | 29.82 | 72.0 | W | 6.0 | 0.00 | Fair | 4 |
| 2738 | TX | 2020-06-09 09:38:20 | 3 | -95.265442 | 29.768175 | 69.0 | 29.73 | 87.0 | SSW | 9.0 | 0.00 | Fair | 4 |